Federated learning seeks to address the issue of isolated data islands by making clients disclose only their local training models. However, it was demonstrated that private information could still be inferred by analyzing local model parameters, such as deep neural network model weights. Recently, differential privacy has been applied to federated learning to protect data privacy, but the noise added may degrade the learning performance much. Typically, in previous work, training parameters were clipped equally and noises were added uniformly. The heterogeneity and convergence of training parameters were simply not considered. In this paper, we propose a differentially private scheme for federated learning with adaptive noise (Adap DP-FL). Specifically, due to the gradient heterogeneity, we conduct adaptive gradient clipping for different clients and different rounds; due to the gradient convergence, we add decreasing noises accordingly. Extensive experiments on real-world datasets demonstrate that our Adap DP-FL outperforms previous methods significantly.
translated by 谷歌翻译
While large pre-trained models have transformed the field of natural language processing (NLP), the high training cost and low cross-lingual availability of such models prevent the new advances from being equally shared by users across all languages, especially the less spoken ones. To promote equal opportunities for all language speakers in NLP research and to reduce energy consumption for sustainability, this study proposes an effective and energy-efficient framework GreenPLM that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost. We validate this approach in 18 languages and show that this framework is comparable to, if not better than, other heuristics trained with high cost. In addition, when given a low computational cost (2.5\%), the framework outperforms the original monolingual language models in six out of seven tested languages. We release language models in 50 languages translated from English and the source code here.
translated by 谷歌翻译
When reading a story, humans can rapidly understand new fictional characters with a few observations, mainly by drawing analogy to fictional and real people they met before in their lives. This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., humans' theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP benchmark, TOM-IN-AMC, the first assessment of models' ability of meta-learning of ToM in a realistic narrative understanding scenario. Our benchmark consists of $\sim$1,000 parsed movie scripts for this purpose, each corresponding to a few-shot character understanding task; and requires models to mimic humans' ability of fast digesting characters with a few starting scenes in a new movie. Our human study verified that humans can solve our problem by inferring characters' mental states based on their previously seen movies; while the state-of-the-art metric-learning and meta-learning approaches adapted to our task lags 30% behind.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
Deep Neural Networks are vulnerable to adversarial attacks. Among many defense strategies, adversarial training with untargeted attacks is one of the most effective methods. Theoretically, adversarial perturbation in untargeted attacks can be added along arbitrary directions and the predicted labels of untargeted attacks should be unpredictable. However, we find that the naturally imbalanced inter-class semantic similarity makes those hard-class pairs become virtual targets of each other. This study investigates the impact of such closely-coupled classes on adversarial attacks and develops a self-paced reweighting strategy in adversarial training accordingly. Specifically, we propose to upweight hard-class pair losses in model optimization, which prompts learning discriminative features from hard classes. We further incorporate a term to quantify hard-class pair consistency in adversarial training, which greatly boosts model robustness. Extensive experiments show that the proposed adversarial training method achieves superior robustness performance over state-of-the-art defenses against a wide range of adversarial attacks.
translated by 谷歌翻译
Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which is proposed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical supports for several representative OOD detection works based on our OOD theory.
translated by 谷歌翻译
Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs). They mix two images as inputs for training and assign them with a mixed label with the same ratio. While they are shown effective for vision transformers (ViTs), we identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies. We empirically observe that the contributions of input tokens fluctuate as forward propagating, which might induce a different mixing ratio in the output tokens. The training target computed by the original data mixing strategy can thus be inaccurate, resulting in less effective training. To address this, we propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token. We reuse the computed attention at each layer for efficient token-label alignment, introducing only negligible additional training costs. Extensive experiments demonstrate that our method improves the performance of ViTs on image classification, semantic segmentation, objective detection, and transfer learning tasks. Code is available at: https://github.com/Euphoria16/TL-Align.
translated by 谷歌翻译
在人工智能(AI),机器学习(ML)以及更具体地说,深度学习(DL)的核心已经取得了巨大的成功。但是,由于缺少培训ML或DL模型中缺少类是看不见的,看不见的类标签预测的探索程度要小得多。在这项工作中,我们提出了一个模糊的推理系统,通过与基于曲率的特征选择(CFS)方法结合使用TSK+模糊推理引擎来应对这一挑战。通过预测物联网(IoT)领域内的网络设备的定位标签,已经评估了我们系统的实际可行性。竞争性预测性能证实了我们系统的效率和功效,尤其是在模型训练阶段看不见的大量连续类标签时。
translated by 谷歌翻译
尽管变形金刚及其变体构象体在语音识别方面表现出了有希望的表现,但参数化的属性在训练和推理过程中导致了很大的记忆成本。一些作品使用跨层重量分享来减少模型的参数。但是,不可避免的能力损失会损害模型性能。为了解决这个问题,本文提出了通过共享稀疏门控专家的参数效率构象异构体。具体而言,我们使用稀疏门控的专家(MOE)来扩展构型块的容量而不增加计算。然后,共享分组构象块的参数,以减少参数的数量。接下来,为了确保具有不同级别适应表示的灵活性的共享块,我们会单独设计MOE路由器和标准化。此外,我们使用知识蒸馏来进一步提高性能。实验结果表明,与全参数模型相比,所提出的模型用编码器的1/3来实现竞争性能。
translated by 谷歌翻译
知识图(kg)以其大规模和知识推断能力而闻名,但也因与之相关的不完整而臭名昭著。由于关系长尾分布在公斤中的长尾分布,因此很少有人提出完成kg的完成,以减轻不完整和扩大kg的覆盖范围。它旨在对涉及新关系的三胞胎进行预测,当时仅提供少量培训三胞胎作为参考。以前的方法主要集中在设计本地邻居聚合器以学习实体级信息和/或在三胞胎级别实现顺序依赖性假设以学习元关系信息。但是,对于学习几乎没有射击关系的元表示,很大程度上忽略了宝贵的成对三重级交互和上下文级别的关系信息。在本文中,我们提出了一种分层的关系学习方法(雇用),以完成几次kg完成。通过共同捕获三个级别的关系信息(实体级别,三胞胎级别和上下文级别),雇用可以有效地学习和完善几乎没有射击关系的元表示,因此可以很好地推广到新的看不见的关系。在两个基准数据集上进行的广泛实验验证了雇用与其他最先进方法的优势。
translated by 谷歌翻译